Bittersweet Economics: How Sugar Intake Predicts Health Expenditures Worldwide
Repository
Introduction
The rise of processed foods not only affects our individual health but may also unleash cascading effects on global healthcare economies.
In an era of rising chronic disease rates and unprecedented sugar intake from foods that are more manufactured than grown, we sought to quantify the global relationship between sugar intake and healthcare spending over time. We explore two variables integral to understanding this evolution of nutritional health. The first is the quantity of sugars and sweeteners consumed per person (measured in g per day). This data originates from the United Nations’ Food and Agriculture Organization’s FAOSTAT database and is compiled by Gapminder. Data for 2004 was missing, so the 2004 values are a rough extrapolation calculated by Gapminder.
The second variable is the total health spending per person as measured in international dollars, represented using purchasing power parity (PPP), a currency conversion rate that equalizes different currencies by removing differences in price levels amongst countries. This data comes from the World Health Organization’s Global Health Expenditure Database (GHED).
We hypothesize that these two variables are strongly related and that increases in sugar consumption are accompanied by rising health spending per person worldwide. An article from UC Berkeley Public Health (Berthold, 2023) lends support to this, reporting that a local soda tax in Oakland, CA resulted in a 26.8% drop in purchases of sugar-sweetened beverages. We explore whether preventing diseases associated with these sugary beverages (diabetes, heart disease, stroke, gum disease) reduces health care costs, and we extend this pattern to a global scale.
Dataset Sources
Total Health Spending per person (International $)
Source: https://www.who.int/gho/en/
Shows the average health expenditure per person, expressed in international dollars using PPP (purchasing power parity)
Sugar per person (g per day)
Source: https://www.fao.org/faostat/en/#home
Quantity of food consumption of sugars and sweeteners (g per person per day); 2004 data is a rough extrapolation.
Data Cleaning
Preprocessing Steps
Joined Dataset Sample
| country | year | spending | sugar |
|---|---|---|---|
| Norway | 2010 | 8090 | 105 |
| Norway | 2008 | 8070 | 105 |
| Norway | 2009 | 7530 | 103 |
| Norway | 2007 | 7310 | 103 |
| Norway | 2006 | 6250 | 101 |
| Luxembourg | 2010 | 8180 | 137 |
Cleaning Write-up
To prepare the data for analysis, we combined two datasets: one reporting average sugar consumption per person per day and the other detailing health care spending per person, both measured by country and year. Each dataset originally contained a wide format with multiple year columns; we reshaped them so that each row represented a single country-year observation.
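The wide-to-long reshaping described above can be sketched in a few lines. This is a minimal illustration in plain Python, not the pipeline used for this report; the function name `wide_to_long`, the column layout, and the missing Luxembourg value are assumptions made for the example.

```python
# Hypothetical wide-format rows: one row per country, one column per year.
# The missing 2009 value for Luxembourg is invented to show how gaps carry over.
wide_sugar = [
    {"country": "Norway", "2009": 103.0, "2010": 105.0},
    {"country": "Luxembourg", "2009": None, "2010": 137.0},
]


def wide_to_long(rows, value_name):
    """Turn one row per country into one row per country-year observation."""
    long_rows = []
    for row in rows:
        for column, value in row.items():
            if column == "country":
                continue  # the key column is not a year
            long_rows.append(
                {"country": row["country"], "year": int(column), value_name: value}
            )
    return long_rows


long_sugar = wide_to_long(wide_sugar, "sugar")
for record in long_sugar:
    print(record)
```

Each input row with two year columns yields two output rows, which is why the row count grows substantially after reshaping.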
After reshaping, we checked for duplicate country-year combinations and found none. Each row corresponded to a unique country-year pair, confirming structural integrity prior to joining. Before merging, we also ensured consistency in country names and removed any observations lacking year or country information.
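A check of this kind might look like the following sketch, which drops rows lacking a key and then counts repeated country-year pairs. The helper names and the toy rows are hypothetical, and the two problematic rows are planted on purpose to show what the check would flag.

```python
from collections import Counter

# Hypothetical long-format rows; the last two are deliberately problematic.
rows = [
    {"country": "Norway", "year": 2009, "sugar": 103.0},
    {"country": "Norway", "year": 2010, "sugar": 105.0},
    {"country": "Norway", "year": 2010, "sugar": 105.0},  # duplicated key
    {"country": None, "year": 2010, "sugar": 90.0},       # missing country
]


def drop_incomplete_keys(rows):
    """Keep only rows that carry both a country and a year."""
    return [r for r in rows if r.get("country") and r.get("year") is not None]


def duplicate_keys(rows):
    """Return every (country, year) pair that occurs more than once."""
    counts = Counter((r["country"], r["year"]) for r in rows)
    return sorted(key for key, n in counts.items() if n > 1)


keyed_rows = drop_incomplete_keys(rows)
print(duplicate_keys(keyed_rows))
```

An empty list from `duplicate_keys` corresponds to the "found none" result reported above.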
The merged dataset includes observations from 1961 to 2018, though not all countries report data for every year. The original wide-format datasets had 179 and 190 rows, respectively; reshaping expanded each of them into one row per country-year, which is why the merged data is far larger. The final cleaned dataset contains 2,585 rows, incorporating all meaningful observations while excluding any instance where both sugar and spending values are missing.
We validated our join by identifying unmatched country-year pairs using anti_join(), finding 7,742 country-year combinations in the sugar dataset with no match in the spending dataset. This highlights that there are large data gaps in health spending records.
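For readers unfamiliar with anti-joins, the idea is to keep the rows of one table whose key has no counterpart in the other. The sketch below mimics that behavior in plain Python. The function `anti_join_rows` is a hypothetical stand-in rather than the `anti_join()` call used in the analysis, and the 1961 sugar value is invented for illustration.

```python
def anti_join_rows(left, right, keys=("country", "year")):
    """Return the rows of `left` whose key has no match in `right`."""
    right_keys = {tuple(row[k] for k in keys) for row in right}
    return [row for row in left if tuple(row[k] for k in keys) not in right_keys]


# Toy tables: sugar has an early year that spending does not cover.
sugar = [
    {"country": "Norway", "year": 1961, "sugar": 100.0},  # invented value
    {"country": "Norway", "year": 2010, "sugar": 105.0},
]
spending = [{"country": "Norway", "year": 2010, "spending": 8090.0}]

unmatched = anti_join_rows(sugar, spending)
print(len(unmatched), "sugar rows have no spending match")
```

Running the same check in the other direction (spending against sugar) would show whether any spending records lack sugar data.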
Modeling the Relationship between Sugar Consumption and Health Spending by Country
Static Visualization
The following scatterplot visualizes the relationship between sugar consumption and the log of health care spending. Each point represents a single country’s average values across all recorded years. The estimated linear trend, shown in blue, highlights the direction of the association.
The upward-sloping regression line suggests that, in general, countries with higher sugar consumption tend to spend more on health care per person. However, the wide spread of the points around the line indicates that other factors beyond sugar intake likely influence health spending as well.
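Each point in the scatterplot is a per-country summary, so the plotted data can be thought of as the output of an aggregation like the sketch below. It uses the six rows from the joined sample table. Taking the log of the averaged spending (rather than averaging logged values) is an assumption about the order of operations, and the function name is hypothetical.

```python
import math
from collections import defaultdict

# Rows copied from the joined dataset sample shown earlier.
joined = [
    {"country": "Norway", "year": 2010, "spending": 8090, "sugar": 105},
    {"country": "Norway", "year": 2008, "spending": 8070, "sugar": 105},
    {"country": "Norway", "year": 2009, "spending": 7530, "sugar": 103},
    {"country": "Norway", "year": 2007, "spending": 7310, "sugar": 103},
    {"country": "Norway", "year": 2006, "spending": 6250, "sugar": 101},
    {"country": "Luxembourg", "year": 2010, "spending": 8180, "sugar": 137},
]


def country_means(rows):
    """Average sugar and spending per country, then log the average spending."""
    grouped = defaultdict(lambda: {"sugar": [], "spending": []})
    for row in rows:
        if row["sugar"] is None or row["spending"] is None:
            continue  # skip incomplete observations
        grouped[row["country"]]["sugar"].append(row["sugar"])
        grouped[row["country"]]["spending"].append(row["spending"])
    summary = []
    for country in sorted(grouped):
        sugar_values = grouped[country]["sugar"]
        spending_values = grouped[country]["spending"]
        avg_spending = sum(spending_values) / len(spending_values)
        summary.append({
            "country": country,
            "avg_sugar": sum(sugar_values) / len(sugar_values),
            "log_avg_spending": math.log(avg_spending),  # assumed order: mean, then log
        })
    return summary


means = country_means(joined)
for record in means:
    print(record)
```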
Animated Visualization
To examine how this relationship evolves over time, we created an animated plot showing annual trends from 1995–2010. Each frame displays data points for all countries in a single year, with the blue line representing the year-specific linear trend.
This animation shows how the relationship between sugar consumption (g/person/day) and the log of health spending (Intl $/person) has changed across countries from 1995 to 2010. Each point represents a country in a given year, and the blue line shows the trend for that year using linear regression.
While the overall association remains positive, the strength and spread of this relationship fluctuate. For example, from the late 1990s onward, some countries exhibit rapid increases in spending despite stable sugar levels, suggesting the influence of confounding factors such as economic development or healthcare policy.
Linear Model
To further examine the relationship between sugar consumption and health spending, we fit a linear regression using country-level means for sugar consumption and log-transformed per capita health spending across all years by country. Compared with the year-specific trendlines in the animation, this gives a more stable trend, as it averages out year-to-year fluctuations while still incorporating an adequately long period of data collection.
We fit the following linear regression model on the log scale, written here in terms of spending: \[ \widehat{\text{Health Spending}} = e^{2.665 + 0.030x} \approx 14.37 \cdot e^{0.030x} \]
where \(\widehat{\text{Health Spending}}\) is the predicted average health spending per person per year (international $), obtained by exponentiating the fitted log-scale value \(\hat{y} = 2.665 + 0.030x\), and \(x\) is average daily sugar intake (g/person/day).
| Term | Estimate | Std. Error | t value | p value |
|---|---|---|---|---|
| (Intercept) | 2.665 | 0.187 | 14.243 | < 0.001 |
| avg_sugar | 0.030 | 0.002 | 14.967 | < 0.001 |
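To make the estimation step concrete, the following sketch fits a one-predictor least-squares line to a log-scale response using the closed-form formulas. The data are synthetic: the responses are generated directly from the reported coefficients, so the fit simply recovers them. The function name and the sugar levels are assumptions for the example, not values from the dataset.

```python
import math


def fit_simple_ols(x, y):
    """Closed-form least squares for y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Synthetic sugar levels (g/person/day) and noise-free log spending built
# from the reported coefficients; real data would scatter around this line.
avg_sugar = [20.0, 60.0, 100.0, 140.0]
log_spending = [2.665 + 0.030 * s for s in avg_sugar]

intercept, slope = fit_simple_ols(avg_sugar, log_spending)


def predict_spending(sugar):
    """Back-transform the log-scale prediction to international dollars."""
    return math.exp(intercept + slope * sugar)


print(round(intercept, 3), round(slope, 3))
```

Exponentiating the fitted value, as `predict_spending` does, is what turns the log-scale line into the exponential form of the model equation.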
The intercept implies that a country with zero sugar consumption would have an estimated health spending of approximately \(e^{2.665} \approx 14.37\) international dollars per person per year. However, since no countries in the dataset consume zero sugar, this estimate is an extrapolation beyond the observed domain (minimum observed value: \(7.618\) g/person/day) and is not interpretable in isolation. The slope indicates that each additional gram of sugar consumed per person per day is associated with an estimated \((e^{0.030} - 1) \times 100\% \approx 3.0\%\) increase in average health spending per person.
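As a worked check on the slope interpretation, the ratio of predicted spending at two sugar levels that differ by \(\Delta\) grams depends only on \(\Delta\), not on the starting level. The 10-gram case is an illustrative choice and is not taken from the original analysis.

\[ \frac{e^{2.665 + 0.030(x + \Delta)}}{e^{2.665 + 0.030x}} = e^{0.030\,\Delta}, \qquad e^{0.030 \cdot 1} \approx 1.030, \qquad e^{0.030 \cdot 10} \approx 1.350 \]

Under the fitted model, a 1 g/person/day difference corresponds to roughly 3% higher predicted spending, and a 10 g/person/day difference to roughly 35%, because the effect compounds multiplicatively rather than adding.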
Model Fit
Decomposition of Model Variance
The following table breaks down the total variation in the outcome variable into explained and unexplained components:
| Model Fit Variables | Value |
|---|---|
| variance in response values | 2.803 |
| variance in fitted values | 1.622 |
| variance in residuals | 1.180 |
| r-squared | 0.579 |
The R² value of 0.579 indicates that sugar consumption explains 57.9% of the variability in log health spending across countries. This suggests a moderate association, but also implies that the remaining 42.1% of the variation is likely driven by a multitude of other factors, such as economic development or healthcare policy.
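The arithmetic linking the table rows can be verified directly: R² is the fitted variance divided by the response variance, or equivalently one minus the residual share. The sketch below uses only the rounded values reported in the table, so small discrepancies in the last digit come from rounding; the variable names are assumptions.

```python
# Rounded values copied from the model-fit table above.
var_response = 2.803
var_fitted = 1.622
var_residuals = 1.180

# Two equivalent routes to R-squared for a least-squares fit with an intercept.
r2_from_fitted = var_fitted / var_response
r2_from_residuals = 1 - var_residuals / var_response

# Explained plus unexplained variance should rebuild the total, up to rounding.
reconstruction_gap = var_response - (var_fitted + var_residuals)

print(round(r2_from_fitted, 3), round(r2_from_residuals, 3))
print(round(reconstruction_gap, 3))
```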
Cross Validation
K Fold Distribution
The histogram displays the distribution of R² values across the 15 folds used in cross-validation. The average R² is approximately 0.421, indicated by the red dashed line. This suggests that, on average, the model is capturing a limited amount of variability in the validation sets.
Most R² values fall between 0 and 1, with a concentration near 0.25 to 0.75, indicating that the model often explains a moderate portion of the variability in health spending. However, the considerable variation in performance across folds suggests the model’s predictive power is sensitive to which countries are in each fold, likely reflecting regional, economic, or policy-driven differences in healthcare spending.
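There is no strong evidence of severe overfitting: the average held-out R² (0.421) is lower than the in-sample R² (0.579), as is typical for held-out data, but the model still explains a meaningful share of the variation in most folds. However, the inconsistency across folds suggests that additional predictors or a more flexible model may improve stability and overall fit.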
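The cross-validation procedure can be sketched as follows: shuffle the countries, split them into 15 folds, fit the line on 14 folds, and score it on the held-out fold. Everything here is illustrative. The data are synthetic (generated around the reported coefficients with an assumed noise level), the function names are hypothetical, and held-out R² is computed as one minus the ratio of residual to total sum of squares, which may differ from the definition used for the histogram.

```python
import random


def fit_simple_ols(x, y):
    """Closed-form least squares for y = intercept + slope * x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def held_out_r2(x_train, y_train, x_test, y_test):
    """R-squared on a held-out fold; it can be negative for a poor fit."""
    intercept, slope = fit_simple_ols(x_train, y_train)
    mean_test = sum(y_test) / len(y_test)
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x_test, y_test))
    ss_tot = sum((yi - mean_test) ** 2 for yi in y_test)
    return 1 - ss_res / ss_tot


def k_fold_r2(x, y, k=15, seed=0):
    """Return one held-out R-squared per fold."""
    indices = list(range(len(x)))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]  # near-equal fold sizes
    scores = []
    for fold in folds:
        held_out = set(fold)
        train = [i for i in indices if i not in held_out]
        scores.append(held_out_r2(
            [x[i] for i in train], [y[i] for i in train],
            [x[i] for i in fold], [y[i] for i in fold],
        ))
    return scores


# Synthetic stand-in for the country-level data (assumed noise sd of 1.0).
rng = random.Random(1)
sugar = [rng.uniform(10, 150) for _ in range(150)]
log_spending = [2.665 + 0.030 * s + rng.gauss(0, 1.0) for s in sugar]

scores = k_fold_r2(sugar, log_spending, k=15)
print(len(scores), "folds; mean held-out R2:", sum(scores) / len(scores))
```

With only about ten countries per fold, individual scores can swing widely, which is consistent with the fold-to-fold variation described above.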
References
Berthold, J. (2023, April 21). Sugary drink tax improves health, lowers health care costs. UC Berkeley School of Public Health. https://publichealth.berkeley.edu/articles/spotlight/research/sugary-drink-tax-improves-health
Food and Agriculture Organization. (2024). FAOSTAT: Sugar & sweeteners food supply data. United Nations. http://data.un.org/Data.aspx?q=Sugar&d=FAO&f=itemCode:2909
World Health Organization. (2024). Global Health Expenditure Database. https://apps.who.int/nha/database